The ACQUILEX LKB: representation issues in semi-automatic acquisition of large lexicons
Abstract
We describe the lexical knowledge base system (LKB) which has been designed and implemented as part of the ACQUILEX project[1] to allow the representation of multilingual syntactic and semantic information extracted from machine readable dictionaries (MRDs), in such a way that it is usable by natural language processing (NLP) systems. The LKB's lexical representation language (LRL) augments typed graph-based unification with default inheritance, formalised in terms of default unification of feature structures. We evaluate how well the LRL meets the practical requirements arising from the semi-automatic construction of a large-scale, multilingual lexicon. The system as described is fully implemented and is being used to represent substantial amounts of information automatically extracted from MRDs.

1 Introduction

The ACQUILEX LKB is designed to support representation of multilingual lexical information extracted from machine readable dictionaries (MRDs) in such a way that it can be utilised by NLP systems. In contrast to lexical database systems (LDBs) or thesaurus-like representations (1988), which represent extracted data in such a way as to support browsing and querying, our goal is to build a knowledge base which can be used as a highly structured, reusable lexicon, albeit one much richer in lexical semantic information than those commonly used in NLP. Thus, although we are using information which has been derived from MRDs (possibly after considerable processing involving some human intervention), our aim is not to represent the dictionary entries themselves. Our methodology is to store the dictionary entries and raw extracted data in our LDB (Carroll, 1990) and to use this information to build LKB entries which can be directly utilised by an NLP system. Briscoe (1991) discusses the LDB/LKB distinction in more detail and describes the ACQUILEX project as a whole.
[1] 'The Acquisition of lexical knowledge for Natural Language Processing systems' (Esprit BRA-3030)

Practical NLP systems need large lexicons. Even in cases such as database front ends, where the domain of the application is highly restricted, a practical natural language interface must be able to cope with an extensive vocabulary, in order to respond helpfully to a user who lacks domain knowledge, for example. For applications such as text-to-speech synthesis, interfaces to large-scale knowledge based systems, summarising and so on, large lexicons are clearly needed; for machine translation the requirement is for a large-scale, multilingual lexical resource. Acquisition of such information is a serious bottleneck in building NLP systems, and MRD sources currently …
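The contrast the abstract draws between plain unification and default unification can be sketched in a few lines. The following is an illustrative toy only, not the LKB's implementation: nested Python dicts stand in for typed feature structures (the actual LRL uses typed graph-based unification). Strict unification fails on a feature clash, whereas default unification lets non-default (strict) information override defaults instead of failing.

```python
def unify(fs1, fs2):
    """Strict unification of feature structures (modelled as nested dicts).
    Returns the merged structure, or None on a feature clash."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None  # a clash anywhere propagates up
                result[feat] = sub
            else:
                result[feat] = val
        return result
    # atomic values unify only if identical
    return fs1 if fs1 == fs2 else None

def default_unify(strict, default):
    """Default unification: `default` fills in gaps in `strict`,
    but on any clash the strict (non-default) information wins."""
    if isinstance(strict, dict) and isinstance(default, dict):
        result = dict(strict)
        for feat, val in default.items():
            if feat in result:
                result[feat] = default_unify(result[feat], val)
            else:
                result[feat] = val
        return result
    return strict  # atomic clash resolved in favour of strict info

# Hypothetical entries for illustration: an exceptional plural noun
# inherits from a default noun template without a unification failure.
noun_template = {"cat": "noun", "agr": {"num": "sg"}}
sheep_entry = {"orth": "sheep", "agr": {"num": "pl"}}

assert unify(sheep_entry, noun_template) is None           # strict: clash on num
merged = default_unify(sheep_entry, noun_template)
assert merged["agr"]["num"] == "pl"                        # entry overrides default
assert merged["cat"] == "noun"                             # default fills the gap
```

The design point this illustrates is the one the abstract relies on: inheritance hierarchies with defaults let exceptional lexical entries (like irregular plurals) coexist with general templates, where strict unification alone would simply fail.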
Similar resources
Translation Equivalence and Lexicalization in the ACQUILEX LKB
We propose a strongly lexicalist treatment of translation equivalence where mismatches due to diverging lexicalization patterns are dealt with by means of translation links which capture crosslinguistic generalizations across sets of semantically related lexical items. We show how this treatment can be developed within a unification-based, multilingual lexical knowledge base which is integrated...
Automatically extracting Translation Links using a wide coverage semantic taxonomy
TGE (Tlink Generator Environment) is a system for semi-automatically extracting translation links. The system was developed within the ACQUILEX II project as a tool for supporting the construction of a multi-lingual lexical knowledge base containing detailed syntactic and semantic information from MRD resources. A drawback of the original system was the need for human intervention for selecting...
Multilingual Lexical Representation
The approach to multilingual lexical representation developed as part of the ACQUILEX Lexical Knowledge Base (LKB) is discussed with specific reference to complex translation equivalence. The treatment described provides a lexicalist account of translation mismatches in terms of translation links which capture cross-linguistic generalizations across sets of semantically related lexical items, and ...
Semi-automatic Acquisition of Domain-specific Translation Lexicons
We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons.
An Overt Semantics With A Machine-Guided Approach For Robust LKBs
In this paper, we report on our experience in building computational semantic lexicons for use in NLP applications. In a machine-guided approach, the computer reduces part of the semantic knowledge to be acquired by an acquirer. An overt semantics can help predict the syntactic behavior of words. By overt semantics we mean applying the linking or lexical rules at the semantic level and not on lexical...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
Publication year: 1992